Diagnostic Evaluation of Machine Translation Systems Using Automatically Constructed Linguistic Check-Points
Authors
Abstract
We present a diagnostic evaluation platform that provides multi-factored evaluation based on automatically constructed check-points. A check-point is a linguistically motivated unit (e.g., an ambiguous word, a noun phrase, a verb-object collocation, or a prepositional phrase) predefined in a linguistic taxonomy. We present a method that automatically extracts check-points from parallel sentences. Using these check-points, our method monitors how an MT system translates important linguistic phenomena and thereby provides diagnostic evaluation. Experiments on various types of MT systems verify the effectiveness of our approach for diagnostic evaluation.
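The abstract does not give the formula used to score a check-point, so the following is only a minimal sketch of how check-point-based diagnosis can work. It assumes each check-point has already been extracted from a parallel sentence pair and paired with its reference-side translation. The sketch credits each check-point with the clipped fraction of its reference n-grams that reappear in the system output, then averages those credits per taxonomy category to produce a diagnostic profile. The `CheckPoint` class, the category labels, and the `max_n` default are hypothetical illustrations, not the paper's definitions.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class CheckPoint:
    """Hypothetical container: one check-point and its reference translation."""
    category: str          # taxonomy label (illustrative), e.g. "noun_phrase"
    ref_tokens: list[str]  # reference-side tokens aligned to the check-point


def ngrams(tokens, n):
    """Multiset of n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def checkpoint_credit(cp, candidate, max_n=2):
    """Assumed scoring rule: clipped share of the check-point's reference
    n-grams (orders 1..max_n) that also occur in the candidate translation."""
    matched = total = 0
    for n in range(1, max_n + 1):
        ref = ngrams(cp.ref_tokens, n)
        cand = ngrams(candidate, n)
        # Clip each match by its count in the candidate, as BLEU does.
        matched += sum(min(count, cand[g]) for g, count in ref.items())
        total += sum(ref.values())
    return matched / total if total else 0.0


def diagnostic_report(items, max_n=2):
    """Average check-point credit per category.

    items: iterable of (candidate_tokens, [CheckPoint, ...]) pairs,
    one pair per test sentence.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for candidate, checkpoints in items:
        for cp in checkpoints:
            sums[cp.category] += checkpoint_credit(cp, candidate, max_n)
            counts[cp.category] += 1
    return {cat: sums[cat] / counts[cat] for cat in sums}


# Usage: one system output and two check-points from its reference.
cand = "he opened the bank account yesterday".split()
cps = [
    CheckPoint("ambiguous_word", ["bank"]),
    CheckPoint("noun_phrase", ["a", "bank", "account"]),
]
report = diagnostic_report([(cand, cps)])
print(report)  # {'ambiguous_word': 1.0, 'noun_phrase': 0.6}
```

Grouping credits by category is what makes the evaluation diagnostic: instead of one corpus-level number, the report shows which kinds of linguistic phenomena a system handles well or poorly.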
Similar resources
Introduction to China’s CWMT2008 Machine Translation Evaluation
This paper presents an overall introduction to the CWMT2008 evaluation and focuses on its two new metrics: BLEU-SBP (Chiang et al., 2008) and the linguistic check-point method (Zhou et al., 2008). BLEU-SBP is a revised BLEU with a strict brevity penalty. Our experiments validated the effectiveness of BLEU-SBP in resolving the non-decomposability problem of both NIST-BLEU and IBM-BLEU at the sentence level. Lingui...
Woodpecker: An Automatic Methodology for Machine Translation Diagnosis with Rich Linguistic Knowledge
Unlike “black-box” evaluation, diagnostic evaluation aims to provide better insight into various aspects of the performance of artificial intelligence systems. However, for machine translation (MT) systems, owing to their complexity and knowledge dependency, such diagnostic evaluation often demands a large amount of manual work. To tackle this problem, we propose an auto...
The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language
Machine Translation Evaluation Metrics (MTEMs) are central to Machine Translation (MT) engines, since these engines are developed through frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages are still in question. The aim of this research study was to examine the validity and assess the quality of MTEMs from the Lexical Similarity set on machine tra...
A Framework for Diagnostic Evaluation of MT Based on Linguistic Checkpoints
This paper describes an approach to the diagnostic evaluation of machine translation (MT) based on linguistic checkpoints, which can provide valuable information both to the developers and to the end-users of MT systems. We present a flexible framework and a new tool, DELiC4MT, for fine-grained diagnostic MT evaluation which can be extended to any language pair and applied to any evaluation tar...
DELiC4MT: A Tool for Diagnostic MT Evaluation over User-defined Linguistic Phenomena
This paper demonstrates DELiC4MT, a piece of software that allows the user to perform diagnostic evaluation of machine translation systems over linguistic checkpoints, i.e., source-language lexical elements and grammatical constructions specified by the user. Our integrated tool builds upon best practices, software components and formats developed under different projects and initiatives, focusi...